ABSTRACT

Speech synthesis is a technology used in many different areas of computer science. Because it converts text to
speech, it can offer a solution to the reading difficulties of visually impaired people. Motivated by this problem,
this study designs a system that enables a visually impaired person to make use of all the library facilities at
Sakarya University. A certain number of books in the library are digitized with a scanner and transferred to the
library's own server. A visually impaired person can use the system through the program's voice guidance and
keyboard commands. The system will be developed further according to new requests. The purpose of this study is
thus to offer a solution to a social problem faced by visually impaired people.

INTRODUCTION 

Speech is one of the means of communication between people. The production of human speech by a computer or
other device from the phonetic expansion of a text or message is called speech synthesis (DUTOIT, 1997). Speech
synthesis can be performed by concatenating audio segments stored in a recorded audio database. Systems that use
phoneme pairs (diphones) as audio segments can synthesize all kinds of words from a small amount of recorded
material by using Lego logic (DUTOIT, 1996). However, such synthesis systems perform poorly in terms of
intelligibility and naturalness. For this reason, unit selection systems that use longer pieces of audio are more
widely used today (KOMINEK, 2003) (J.ZHANG, 2004).

Studies on Turkish are still limited, even though a large number of systems have been developed for Western
languages. MBROLA (DUTOIT, 1996), FESTIVAL (DUBUISSON, 2009), MULTEXT (VERONIS, 1994), GENGLISH
(DUTOIT, 2005) and HTS (YAMAGISHI, 2007) have been developed for synthesizing more than one language. Of
these systems, MBROLA has been adapted to Turkish and a working system has been developed (BOZKURT, B., 2001).

In this study, we plan to use a concatenation-based synthesis system built from the triple sounds most frequently
used in Turkish, together with double sounds for the cases not covered by the triple sounds (YURTAY, 2010). It is
a simple, Turkish-based system that takes string data in digital form and joins sound pieces from the sound
database like Lego bricks.

In a study by TUBITAK, the 3000 most frequently occurring triple sounds were determined, and these were found to
cover 90% of Turkish. By adding 383 double sounds that are not contained in the triple sounds, a database of
3383 pieces in total was built (BICIL, 2010).
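
The Lego-style joining of sound pieces can be illustrated with a short sketch. It is only an illustration under
stated assumptions: the text does not describe how a word is split into units, so the greedy longest-match rule,
the single-letter fallback and the tiny sample inventories below are all hypothetical. The real database holds
3000 triple sounds and 383 double sounds.

```python
from typing import List, Set


def segment(word: str, triples: Set[str], doubles: Set[str]) -> List[str]:
    """Split a word into database units, preferring triple sounds over double sounds."""
    units: List[str] = []
    i = 0
    while i < len(word):
        for size, inventory in ((3, triples), (2, doubles)):
            piece = word[i:i + size]
            if len(piece) == size and piece in inventory:
                units.append(piece)
                i += size
                break
        else:
            # Assumed fallback: no stored unit matches, so emit a single letter.
            units.append(word[i])
            i += 1
    return units


# Hypothetical miniature inventories, for illustration only.
TRIPLES = {"ist", "anb"}
DOUBLES = {"ul"}

print(segment("istanbul", TRIPLES, DOUBLES))  # ['ist', 'anb', 'ul']
```

In a full system, each returned unit would be looked up in the sound database and the corresponding audio pieces
joined in order.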

As is well known, visually impaired people today cannot go to a library to read books; apart from Braille books
and books read aloud by a very small number of volunteers, they are deprived of this social activity. In this
study, technological solutions are developed to overcome these shortcomings.

Many studies have been carried out to facilitate the social lives of visually impaired people, to support their
education and to improve their well-being. Chen, C. and Lin, S.Y. (2011) evaluated the effects of rope jumping
exercise on visually impaired students and found a difference in their flexibility and aerobic capacity. Vervaart,
E., Janssen, N.M., Vervloed, M.RJ. (2005) worked on a procedure called In-Sight, developed to screen the higher
levels of visual functioning related to the educational process in visually impaired children around twelve years
old. Fetton, E.A., Blenkhom, R. (1986) discussed technological developments, their educational implications for
communication, and what is necessary to meet the communication needs of visually impaired people in different
environments. Simsek, O., Altun, E., Ates, A. (2010) discussed the difficulties experienced by visually impaired
learners while developing information and communication technology skills and suggested arrangements to help
them develop these skills. Sacks, S., Gaylord-Ross, R. (1989) compared peer-mediated and teacher-directed training
packages for improving a variety of social behaviours of visually impaired students. Bayir, S., Keser, H.,
Numanoglu, G. (2010) found that computer literacy training provides freedom for visually impaired people in
Turkey. In the study by Lisi, F. (2005), some


Copyright © The Turkish Online Journal of Educational Technology 






TOJET: The Turkish Online Journal of Educational Technology - October 2011, volume 10 Issue 4 


methodologies were used to encourage the social integration of visually impaired people into the world of work;
with these methods, they were found to be integrated more successfully. Owsley, C., McGwin, G., Philips, J.M.,
McBeal, S.F., Stalvey, B.T. (2004) studied an educational program intended to reduce the rate of accidents caused
by older drivers who have a visual acuity deficit, slowed visual processing speed, or both.

In the system design, a visually impaired person who comes to the library and wants to use its services is
directed to the designed system by a librarian. The system is intended to let the visually impaired person request
new books, search for books, and have a found book read from a requested page number, using voice guidance and
keyboard commands.
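
The interaction described above, a spoken prompt followed by a key press, can be sketched as a small dispatch
table. This is a minimal sketch under stated assumptions: the key bindings, the prompt wording and the `speak`
callback (a stand-in for the speech synthesis module) are hypothetical and are not taken from the implemented
system.

```python
from typing import Callable, Dict, Optional

# Hypothetical key bindings for the three services named in the text.
MENU: Dict[str, str] = {
    "1": "search for a book",
    "2": "request a new book",
    "3": "read a book from a chosen page",
}


def menu_prompt(menu: Dict[str, str]) -> str:
    """Build the sentence that the voice guidance would read out."""
    return " ".join(f"Press {key} to {action}." for key, action in menu.items())


def handle_key(key: str, menu: Dict[str, str], speak: Callable[[str], None]) -> Optional[str]:
    """Confirm a valid choice by voice, or repeat the menu after an unknown key."""
    action = menu.get(key)
    if action is None:
        speak("Unknown key. " + menu_prompt(menu))
        return None
    speak(f"You chose to {action}.")
    return action


# Here print stands in for the text-to-speech call.
handle_key("2", MENU, print)
```

Passing `speak` as a parameter keeps the menu logic independent of whichever synthesis module produces the audio.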

With the help of this system, visually impaired people can use libraries easily. The system design is fully
applicable; once the processes described in Sections 2 and 3 have been carried out, the system is planned to
serve visually impaired people automatically, with the support of very few staff. The implementation and
application of the system can be examined under two main topics: Preliminary Processes and Application Stage.

PRELIMINARY PROCESSES 
Hardware 

The proposed system requires a server and a computer with at least a 2.53 GHz processor, 4 GB of DDR2 RAM and a
200 GB hard disk. The existing server of the Sakarya University library is being considered for storing the books
in digital form. The number of computers is initially limited to one and can be increased later depending on the
number of users.

Library staff must be confident that they can use the system actively and reliably. Furthermore, the system can
also be used by visually impaired users who have not used it before. For this reason, a monitor and a mouse are
also needed.

While the system is running, the visually impaired user directs it with an input device. At this stage, a choice
must be made between two main input devices: a keyboard and a microphone. Each has advantages and disadvantages
relative to the other. A keyboard is more efficient and reliable for a visually impaired person who knows how to
use one. A microphone removes the need for a keyboard and allows the user to control the system without using
their hands at all. However, today's speech recognition technology is not yet sufficiently efficient, most library
environments are not completely isolated from noise, and most visually impaired people can use a keyboard. For
all these reasons, the keyboard is preferred in this study. In addition, a scanner is required to digitize the
printed documents in the library.

Software 

In addition to serving visually impaired people, the system must have a software infrastructure that is
compatible with the library's own automation system. In this way, the proposed system and the library hardware
will be used more efficiently, and the system's computers can be used like the other computers in the library.

Either paid or free software can be chosen for the speech synthesis module. In the proposed system, however, the
speech synthesis module that we developed previously will be used (YUCEL, 2010).

The most important parts of the system are the voice guidance and control components used by the visually
impaired person. The developed software guides the user by voice, and the system is then controlled through
commands entered from the keyboard. A scanner will be used to digitize the printed documents and books in the
library. Scanning converts the pages of a document to image format, so an OCR (Optical Character Recognition)
system is needed to translate the images into text. Many commercial products, such as FineReader and Readiris,
have been developed to convert printed documents into digital text. If FineReader is chosen, for example, it can
easily be integrated into our own system through the APIs it provides. The system requirements of this software
can be summarized as follows: 128 MB, 16 GB RAM for every additional processor (in a multi-processor system),
250 MB of free disk space for a typical program installation, 100 MB of free disk space for running the program,
a 100% TWAIN-compatible scanner, digital camera or graphics card and a graphics unit (at least 800x600
resolution) with fax modem (http://www.abbyy.com/sdk/, 2011).

PREPARATION STAGE

In this stage, selected printed documents that do not yet exist in digital form are converted to digital media.
The resources in the library can be divided into two groups:

■ Digital Resources 

■ Non-digital Resources

Master’s and doctoral theses in the library belong to the digital resources group. Copies of these documents are
usually available in digital form, in Acrobat Reader or other formats, and all of them are accessible to users.
These sources can therefore be transferred into the system quickly.

Non-digital resources are those that exist in the library only in ink-printed form. Stories, novels, magazines
and newspapers are examples. Speech can be synthesized from these resources once software has converted them to
digital form, and in this way they can be made available to visually impaired users. The problem with resources
that do not exist in digital media (non-digital resources) is therefore the need for digitization.

The designed system aims to do this job statically at first and then dynamically. In the first stage, the 500
most popular books in the library are planned to be digitized with hardware and software support. The transferred
sources will be stored in PDF format. However, one or more people are needed to select the books and transfer
them into the designed system.

In this study, Microsoft MS-Project was used to define the project's activities and the distribution of resources
to tasks. The project is analyzed under the headings of scope, analysis/hardware/software requirements
determination, design, development, test stage, documentation, application, dissemination and last revisions.
Determining the project's scope takes 3.5 days; analysis/hardware/software requirements determination takes 12.5
days; designing a suitable and functional environment for the library and obtaining permits takes 7.5 days;
development, which includes procuring the software to be used and integrating it into the system, takes 30 days;
testing the system takes 4 days; training takes 6 days; preparing the help documentation takes 18 days;
application takes 7 days; dissemination takes 3 days; and the last revisions take 3 days. As a result, the
estimated time until the system is opened for use was calculated as approximately 94.5 days. The project plan
prepared with MS-Project is shown in Figure 1.
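
The 94.5-day estimate can be checked by adding up the stated task durations. The short sketch below does only
that; it assumes the tasks run one after another with no overlap, which is what the stated total implies, and the
dictionary keys are abbreviated labels rather than the task names used in MS-Project.

```python
# Task durations in days, as stated in the text.
DURATIONS_DAYS = {
    "scope determination": 3.5,
    "analysis/hardware/software requirements": 12.5,
    "design and obtaining permits": 7.5,
    "development": 30.0,
    "testing": 4.0,
    "training": 6.0,
    "help documentation": 18.0,
    "application": 7.0,
    "dissemination": 3.0,
    "last revisions": 3.0,
}

# Assumption: the tasks are strictly sequential, so the total is a plain sum.
total_days = sum(DURATIONS_DAYS.values())
print(f"Estimated total: {total_days} days")  # Estimated total: 94.5 days
```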

Many people work on the designed project: management, for determining the scope; a project manager, for resource
assignments, the choice of software and hardware, and following the project; an analyst, for the design of a
suitable and functional environment for the library and for obtaining permits; developers, for the software; a
tester, for testing the system; a trainer; technical service, for the documentation process; and a distribution
team, for collecting user opinions. The resource assignments made with MS-Project are shown in Figure 2.



Figure 1: MS-Project work breakdown structure 

Figure 2: MS-Project resource assignments


APPLICATION STAGE 

After the preparation stage, while the designed system continues to serve visually impaired people, it will keep
evolving according to their wishes. In this stage, the system does not need the active maintenance required in
the preparation stage; only newly requested ink-printed books continue to be transferred to the system. In this
way, the system will run more efficiently.

PERFORMANCE OF THE SYSTEM 

The test texts and the synthesis durations of the TTS module, measured on hardware meeting the stated
requirements, are shown in Table 1 and Table 2 below.


Table 1: Test data for our TTS Module

Text No   Text

1         İstanbul

2         Cehalet erdemdir.

3         Yaratıcılığın yüzde doksanı terlemektir.

4         Sanatsız kalmış bir milletin hayat damarlarından biri kopmuş demektir.

5         Metinden Konuşma Sentezi çoklu-katman işlemleri içerir. Önemli bir ön katman metinin tamamen
          harflerine ayrılarak “normalize” edilmesidir; kısaltmaların tam metin karşılıklarıyla değiştirilmesi,
          tire ve belirsiz noktalamanın temizlenmesi, sayıların harflere dökülmesi ve aksanların uygun
          sembollerle değiştirilmesini içerir. Bu ön işlem dil bağımlıdır ve her dil için gramerine, yazılışına
          ve sözlüğüne dayanan özelleşmiş kurallar gerekir. MTRD bu amaçla Türkçe için bir ön-işlemci
          geliştirmiştir.

6         Toplumumuzun yaşam kalitesinin artmasına ve ülkemizin sürdürülebilir gelişmesine hizmet eden, bilim
          ve teknoloji alanlarında yenilikçi, yönlendirici, katılımcı ve paylaşımcı bir kurum olma vizyonunu
          benimseyen TÜBİTAK, akademik ve endüstriyel araştırma geliştirme çalışmalarını ve yenilikleri
          desteklemek, ulusal öncelikler doğrultusunda Araştırma-Teknoloji-Geliştirme çalışması yürüten Ar-Ge
          enstitülerini işletme işlevlerinin yanı sıra, ülkemizin Bilim ve Teknoloji politikalarını belirlemekte
          ve toplumun her kesiminde bu farkındalığı artırmak üzere kitaplar ve dergiler yayınlamaktadır.

7         Özgürlük ve bağımsızlık benim karakterimdir. Ben milletimin en büyük ve ecdadımın en değerli mirası
          olan bağımsızlık aşkı ile dolu bir adamım. Çocukluğumdan bugüne kadar ailevi, hususi ve resmi
          hayatımın her safhasını yakından bilenler bu aşkım malumdur. Bence bir millete şerefin, haysiyetin,
          namusun ve insanlığın vücut ve beka bulabilmesi mutlaka o milletin özgürlük ve bağımsızlığına sahip
          olmasıyla kaimdir. Ben şahsen bu saydığım vasıflara, çok ehemmiyet veririm. Ve bu vasıfların kendimde
          mevcut olduğunu iddia edebilmek için milletimin de aynı vasıfları taşımasını esas şart bilirim. Ben
          yaşabilmek için mutlaka bağımsız bir milletin evladı kalmalıyım. Bu sebeple milli bağımsızlık bence
          bir hayat meselesidir. Millet ve memleketin menfaatleri icap ettirirse, insanlığı teşkil eden
          milletlerden her biriyle medeniyet icabı olan dostluk ve siyaset münasebetlerini büyük bir
          hassasiyetle takdir ederim. Ancak, benim milletimi esir etmek isteyen herhangi bir milletin, bu
          arzusundan vazgeçinceye kadar, amansız düşmanıyım.

Table 2: Test times for our TTS Module (all times in seconds)

The initialization time of the TTS Module: 0.131243

Text   Text            Selection of   Creating the   Audio Player     Total Synthesis   Generated Audio
No     Normalization   Logotoms       Wave File      Initialization   Time              File Time

1      0.000001        0.000082       0.000054       0.000005         0.000141          0.850341
2      0.000001        0.000140       0.000089       0.000003         0.000232          1.290039
3      0.000002        0.000300       0.000224       0.000003         0.000529          3.228515
4      0.000002        0.000515       0.000336       0.000003         0.000855          5.57373
5      0.000017        0.003525       0.002086       0.000004         0.005632          39.115234
6      0.000010        0.004233       0.002174       0.000004         0.006421          47.38208
7      0.000015        0.007109       0.003109       0.000015         0.010248          80.46997


As shown in Table 2, text number 7 has 138 words and was synthesized in approximately 0.010248 seconds in total.
On this basis, if we assume that a book has approximately 500 words per page, approximately 0.037130 seconds are
needed to synthesize the 500 words of one page.
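
The per-page figure follows from scaling the measured time of text 7 linearly with the word count. The sketch
below reproduces that arithmetic; the linear scaling is an assumption of the estimate rather than a measured
property of the module, and the function name is only illustrative.

```python
WORDS_IN_TEXT_7 = 138             # word count of text 7
SYNTHESIS_SECONDS_TEXT_7 = 0.010248  # total synthesis time of text 7 (Table 2)
WORDS_PER_PAGE = 500              # assumed average number of words on a book page


def estimated_synthesis_seconds(words: int) -> float:
    """Scale the measured time of text 7 linearly to another word count."""
    return SYNTHESIS_SECONDS_TEXT_7 / WORDS_IN_TEXT_7 * words


page_seconds = estimated_synthesis_seconds(WORDS_PER_PAGE)
print(f"{page_seconds:.6f} s per page")  # 0.037130 s per page
```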

Text Normalization Process and the Challenges of Mathematical Notations 

In this study, one of the problems faced during speech synthesis is the handling of mathematical notation and
images, which are not in plain text format, in Turkish documents. Text normalization is one of the processes that
must be carried out before speech is synthesized; for example, the number 269 is to be read as “two hundred and
sixty nine”. Normalization makes some mathematical notations easy to handle, such as “% = percent”,
“°C = Celsius degree”, “a³ = a cube” and “√7 = square root of seven”.
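
A minimal sketch of this kind of normalization is given below. It is an English-language illustration only and
does not reproduce the Turkish rules of the TTS module: the symbol table, the function names and the restriction
to the numbers 0 to 999 are assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical symbol table based on the examples in the text.
SYMBOLS = {"%": "percent", "°C": "Celsius degree"}


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} and {number_to_words(rest)}"


def _spell(match: "re.Match[str]") -> str:
    """Spell out a matched number, leaving out-of-range numbers unchanged."""
    value = int(match.group())
    return number_to_words(value) if value <= 999 else match.group()


def normalize(text: str) -> str:
    """Replace known symbols and small numbers with their spoken form."""
    for symbol in sorted(SYMBOLS, key=len, reverse=True):
        text = text.replace(symbol, " " + SYMBOLS[symbol])
    text = re.sub(r"\d+", _spell, text)
    return " ".join(text.split())


print(normalize("269"))  # two hundred and sixty nine
print(normalize("25%"))  # twenty five percent
```

Notations such as a³ or √7 would need additional rules of the same kind.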

However, normalizing longer mathematical notation is more difficult. A clear, understandable and simple standard
therefore needs to be created. Once such a standard exists, training and promotion will be required for visually
impaired people. As a result, they will be able to understand long and complex mathematical notation easily in
audio form.

DISCUSSION AND RECOMMENDATIONS 

If a server is installed in the library and the system is integrated with it, the individual computers do not
need large hard disks, and hardware requirements can thus be minimized. In this way, many people may be able to
use the system at the same time.

Although the speed and clarity of our speech synthesizer are sufficient, there is a great need for a
natural-sounding speech synthesizer, because the concentration and productivity of visually impaired people may
fall when listening to monotonous speech. However, research on natural speech synthesis is still ongoing, and it
is a very difficult field of study.

If the system is made available over the internet, visually impaired people will not need to come to the library
to request a new book, which would make the system easier and more effective to use. Data that is already in
digital form is especially suitable for this kind of use.

CONCLUSION 

In this study, a simple, working system was designed to enable a visually impaired person to make use of all the
library facilities at Sakarya University. A visually impaired person can use the system through the program's
voice guidance and keyboard commands. When the specified requirements are met, the system can easily be
established in the library, so that visually impaired people can benefit from its facilities. In addition, a
standard is needed for longer mathematical notations.